Goto

Collaborating Authors

 signature transform



Path Signatures Enable Model-Free Mapping of RNA Modifications

Lemercier, Maud, Arrubarrena, Paola, Di Giorgio, Salvatore, Brettschneider, Julia, Cass, Thomas, Vries, Isabel S. Naarmann-de, Papavasiliou, Anastasia, Ruggieri, Alessia, Tellioglu, Irem, Wu, Chia Ching, Papavasiliou, F. Nina, Lyons, Terry

arXiv.org Machine Learning

Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types within well-characterized sequence contexts for which ample training data exists. Here, we introduce a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads without any other annotated data. For each nanopore read, our approach extracts robust, modification-sensitive features from the raw ionic current signal at a site using the signature transform, then computes an anomaly score by comparing the resulting feature vector to its nearest neighbors in an unmodified reference dataset. We convert anomaly scores into statistical p-values to enable anomaly detection at both individual read and site levels. Validation on densely-modified \textit{E. coli} rRNA demonstrates that our approach detects known sites harboring diverse modification types, without prior training on these modifications. We further applyied this framework to dengue virus (DENV) transcripts and mammalian mRNAs. For DENV sfRNA, it led to revealing a novel 2'-O-methylated site, which we validate orthogonally by qRT-PCR assays. These results demonstrate that our model-free approach operates robustly across different types of RNAs and datasets generated with different nanopore sequencing chemistries.


Local regression on path spaces with signature metrics

Bayer, Christian, Gogolashvili, Davit, Pelizzari, Luca

arXiv.org Machine Learning

We study nonparametric regression and classification for path-valued data. We introduce a functional Nadaraya-Watson estimator that combines the signature transform from rough path theory with local kernel regression. The signature transform provides a principled way to encode sequential data through iterated integrals, enabling direct comparison of paths in a natural metric space. Our approach leverages signature-induced distances within the classical kernel regression framework, achieving computational efficiency while avoiding the scalability bottlenecks of large-scale kernel matrix operations. We establish finite-sample convergence bounds demonstrating favorable statistical properties of signature-based distances compared to traditional metrics in infinite-dimensional settings. We propose robust signature variants that provide stability against outliers, enhancing practical performance. Applications to both synthetic and real-world data - including stochastic differential equation learning and time series classification - demonstrate competitive accuracy while offering significant computational advantages over existing methods.


Supplementary material

Neural Information Processing Systems

Appendix B proves universal approximation of the Neural CDE model, and is substantially more technical than the rest of this paper. Appendix C proves that the Neural CDE model subsumes alternative ODE models which depend directly and nonlinearly on the data. Appendix D gives the full details of every experiment, such as choice of optimiser, hyperparameter searches, and so on. To evaluate the model as discussed in Section 3.2, X must be at least continuous and piecewise differentiable. A.1 Differentiating with respect to the time points However, there is a technical caveat in the specific case that derivatives with respect to the initial time t A.2 Adaptive step size solvers There is one further caveat that must be considered.


Learning non-Markovian Dynamical Systems with Signature-based Encoders

Pradeleix, Eliott, Hosseinkhan-Boucher, Rémy, Shilova, Alena, Semeraro, Onofrio, Mathelin, Lionel

arXiv.org Artificial Intelligence

Neural ordinary differential equations offer an effective framework for modeling dynamical systems by learning a continuous-time vector field. However, they rely on the Markovian assumption--that future states depend only on the current state--which is often untrue in real-world scenarios where the dynamics may depend on the history of past states. This limitation becomes especially evident in settings involving the continuous control of complex systems with delays and memory effects. To capture historical dependencies, existing approaches often rely on recurrent neural network (RNN)-based encoders, which are inherently discrete and struggle with continuous modeling. In addition, they may exhibit poor training behavior. In this work, we investigate the use of the signature transform as an encoder for learning non-Markovian dynamics in a continuous-time setting. The signature transform offers a continuous-time alternative with strong theoretical foundations and proven efficiency in summarizing multidimensional information in time. We integrate a signature-based encoding scheme into encoder-decoder dynamics models and demonstrate that it outperforms RNN-based alternatives in test performance on synthetic benchmarks. The code is available at: https://github.com/eliottprdlx/



SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology

Minchella, Paul, Verlingue, Loïc, Chrétien, Stéphane, Vaucher, Rémi, Metzler, Guillaume

arXiv.org Artificial Intelligence

Electronic medical reports (EHR) contain a vast amount of information that can be leveraged for machine learning applications in healthcare. However, existing survival analysis methods often struggle to effectively handle the complexity of textual data, particularly in its sequential form. Here, we propose SigBERT, an innovative temporal survival analysis framework designed to efficiently process a large number of clinical reports per patient. SigBERT processes timestamped medical reports by extracting and averaging word embeddings into sentence embeddings. To capture temporal dynamics from the time series of sentence embedding coordinates, we apply signature extraction from rough path theory to derive geometric features for each patient, which significantly enhance survival model performance by capturing complex temporal dynamics. These features are then integrated into a LASSO-penalized Cox model to estimate patient-specific risk scores. The model was trained and evaluated on a real-world oncology dataset from the Léon Bérard Center corpus, with a C-index score of 0.75 (sd 0.014) on the independent test cohort. SigBERT integrates sequential medical data to enhance risk estimation, advancing narrative-based survival analysis.


Hypergraphs on high dimensional time series sets using signature transform

Vaucher, Rémi, Minchella, Paul

arXiv.org Machine Learning

In recent decades, hypergraphs and their analysis through Topological Data Analysis (TDA) have emerged as powerful tools for understanding complex data structures. Various methods have been developed to construct hypergraphs -- referred to as simplicial complexes in the TDA framework -- over datasets, enabling the formation of edges between more than two vertices. This paper addresses the challenge of constructing hypergraphs from collections of multivariate time series. While prior work has focused on the case of a single multivariate time series, we extend this framework to handle collections of such time series. Our approach generalizes the method proposed in Chretien and al. by leveraging the properties of signature transforms to introduce controlled randomness, thereby enhancing the robustness of the construction process. We validate our method on synthetic datasets and present promising results.


Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

Genet, Rémi, Inzirillo, Hugo

arXiv.org Artificial Intelligence

In this paper we introduce Keras Sig a high-performance pythonic library designed to compute path signature for deep learning applications. Entirely built in Keras 3, \textit{Keras Sig} leverages the seamless integration with the mostly used deep learning backends such as PyTorch, JAX and TensorFlow. Inspired by Kidger and Lyons (2021),we proposed a novel approach reshaping signature calculations to leverage GPU parallelism. This adjustment allows us to reduce the training time by 55\% and 5 to 10-fold improvements in direct signature computation compared to existing methods, while maintaining similar CPU performance. Relying on high-level tensor operations instead of low-level C++ code, Keras Sig significantly reduces the versioning and compatibility issues commonly encountered in deep learning libraries, while delivering superior or comparable performance across various hardware configurations. We demonstrate through extensive benchmarking that our approach scales efficiently with the length of input sequences and maintains competitive performance across various signature parameters, though bounded by memory constraints for very large signature dimensions.


Sparse Signature Coefficient Recovery via Kernels

Shmelev, Daniil, Salvi, Cristopher

arXiv.org Machine Learning

Central to rough path theory is the signature transform of a path, an infinite series of tensors given by the iterated integrals of the underlying path. The signature poses an effective way to capture sequentially ordered information, thanks both to its rich analytic and algebraic properties as well as its universality when used as a basis to approximate functions on path space. Whilst a truncated version of the signature can be efficiently computed using Chen's identity, there is a lack of efficient methods for computing a sparse collection of iterated integrals contained in high levels of the signature. We address this problem by leveraging signature kernels, defined as the inner product of two signatures, and computable efficiently by means of PDE-based methods. By forming a filter in signature space with which to take kernels, one can effectively isolate specific groups of signature coefficients and, in particular, a singular coefficient at any depth of the transform. We show that such a filter can be expressed as a linear combination of suitable signature transforms and demonstrate empirically the effectiveness of our approach. To conclude, we give an example use case for sparse collections of signature coefficients based on the construction of N-step Euler schemes for sparse CDEs.